Abstract. - The ranking problem in web-based rating systems has attracted much attention. A
good ranking algorithm should be robust against spammer attacks. Here we propose a
correlation-based reputation algorithm to solve the ranking problem in rating systems where
users vote on objects with ratings. In this algorithm, the reputation of a user is iteratively
determined by the correlation coefficient between his/her rating vector and the corresponding
objects' weighted average rating vector. Compared with the iterative refinement (IR) and mean
score algorithms, results for both artificial and real data indicate that the present algorithm
is more robust against spammer attacks.



Introduction. - The abundance of available information troubles people every day, and
information filtering techniques have developed quickly in recent years. An important aspect
of information filtering is the rating system. There is a range of everyday examples of rating
systems, including opinion websites (eBay, Amazon, Movielens, Netflix, etc.) where users
evaluate objects. Ranking is one of the most common ways to describe the aggregated result of
the evaluations, as it gives a simple representation of the comparative qualities of objects.

PageRank is the most widely applied algorithm for search engines, which rank websites based on
the directed hyperlink graph [1]. Recently, some iterative algorithms have been used in
scientific citation networks to rank scientists [2]. Both the hyperlink network and the
scientific citation network are unipartite systems, but many other rating systems have a
bipartite structure with two kinds of nodes: users as evaluators and objects as candidates [3].
In this paper, we consider the ranking problem in such rating systems, where users vote on
objects with ratings, and devise algorithms to accurately rate objects.

Ranking objects according to their average ratings is a straightforward statistical method.
However, in an open evaluation system, a user may not be serious about voting, or may be
inexperienced in the corresponding field and give unreasonable ratings. Even worse, the user
might be a malicious spammer who gives biased ratings on purpose. Therefore, an evaluation
obtained by simply averaging all ratings may be less accurate. Building a reputation system
for users is a good way to solve this problem [4,5]: users with higher reputations are assigned
more weight. Such reputation mechanisms are widely used in online systems, such as online
shops [6], online auctions [7], Wikipedia [8], P2P sharing networks [9], etc.

There are already some ranking algorithms based on reputation estimates [10-13]. In [12, 13],
an iterative refinement (IR) algorithm is proposed, in which a user's reputation is inversely
proportional to the difference between his/her rating vector and the corresponding objects'
weighted average rating vector. The weighted ratings of all objects and the reputations of all
users are recalculated at each step, until the change of the weighted ratings between two
iteration steps is less than a certain threshold. Kerchove and Dooren [11] modify the iterative
refinement algorithm by assigning trust to each individual rating. In most previous works, the
influence of spammer attacks on rating systems is ignored.

In this paper, we propose a correlation-based ranking algorithm. The reputation of a user is
determined by the correlation coefficient between the user's rating vector and the
corresponding objects' weighted average rating vector. The effectiveness of the
correlation-based ranking algorithm is tested on artificial data and compared with other
algorithms. The results show that the correlation-based ranking algorithm is more robust than
the other algorithms. Finally, we use two distinct real data sets (Movielens and Netflix) to
evaluate the effectiveness of the algorithm.

The correlation model. - The rating system we consider can be represented by a bipartite
network, which consists of a set $U$ of users who have each rated some subset of the complete
set $O$ of objects. We use Latin letters for users and Greek letters for objects to distinguish
them. Consequently, $r_{i\alpha}$ denotes the rating given by user $i$ to object $\alpha$. The
set of users who rated a given object $\alpha$ is denoted by $U_\alpha$, while the set of
objects rated by a user $i$ is denoted by $O_i$. The degree of object $\alpha$ (i.e. the number
of ratings given to object $\alpha$) is denoted as $k_{O_\alpha}$, and the degree of user $i$
(i.e. the number of ratings given by user $i$) is denoted as $k_{U_i}$.

We use $Q_{O_\alpha}$ to represent the aggregate estimated quality of object $\alpha$, and
$C_{U_i}$ the reputation of user $i$. The quality of an object depends on the evaluations it
received, and can be defined as the weighted average of the ratings given to this object:

$$Q_{O_\alpha} = \frac{\sum_{i \in U_\alpha} C_{U_i}\, r_{i\alpha}}{\sum_{i \in U_\alpha} C_{U_i}}. \qquad (1)$$

Given the objects' qualities, the Pearson correlation coefficient between the rating vector of
user $i$ and the corresponding objects' quality vector is given by:

$$\mathrm{corr}_i = \frac{1}{k_{U_i}} \sum_{\alpha \in O_i} \left(\frac{r_{i\alpha} - \bar{r}_i}{\sigma_{r_i}}\right) \left(\frac{Q_{O_\alpha} - \bar{Q}_{O_i}}{\sigma_{Q_{O_i}}}\right), \qquad (2)$$

where $\sigma_{r_i}$ and $\sigma_{Q_{O_i}}$ are respectively the standard deviations of the
rating vector of user $i$ and the corresponding objects' quality vector, and $\bar{r}_i$ and
$\bar{Q}_{O_i}$ are their expected values.

The correlation coefficient is a good way to quantify the similarity between two vectors. As a
user whose ratings are more similar to the weighted average ratings should have a higher
reputation, the reputation $C_{U_i}$ of a user $i$ is given based on this similarity (eq. (3)).

The resultant object quality is obtained by initially assigning every user's reputation
according to his degree as $C_{U_i} = k_{U_i}/|O|$, and then iterating eqs. (1), (2) and (3)
until the change of the quality estimates (eq. (4)) is less than a threshold $\Delta$ (a small
power of ten).

Results on artificial data. - When creating the artificial data, we assume that each user $i$
has a certain magnitude of rating error $\delta_i$ ($i = 1, \ldots, |U|$) and each object
$\alpha$ has a certain true intrinsic quality $Q_\alpha$ ($\alpha = 1, \ldots, |O|$). At each
time step $t$, a user-object pair $(i,\alpha)$ for which the rating has not yet been given (at
any $t' < t$) is chosen. The rating is determined as

$$r_{i\alpha} = Q_\alpha + e_{i\alpha}, \qquad (5)$$

where the error $e_{i\alpha}$ is drawn from a probability distribution parameterized by user
$i$'s error magnitude. Ratings $r_{i\alpha}$ lying outside the allowed range are truncated. To
achieve a certain sparsity $\eta$ of the resulting data, the total number of generated ratings
is $\eta|U||O|$, hence $t = 1, \ldots, \eta|U||O|$.

As evident from the power-law-like distributions of the number of ratings given by individual
users and received by individual objects in the real data sets [3], there should be a
preferential attachment mechanism in the evolution of the rating system [14]. In the real data
sets, the more ratings a user has given, the higher the probability that he will give a new
rating, and the more ratings an object has received, the higher the probability that it will
receive a new rating. Based on these observations, at each time step $t$ we choose a
user-object pair $(i,\alpha)$ using a preferential attachment mechanism. The probabilities of
choosing user $i$ and object $\alpha$ at time step $t$ are

$$p_i(t) = \frac{k_{U_i}(t) + 1}{\sum_{j \in U} \left(k_{U_j}(t) + 1\right)} \qquad (6)$$

and

$$p_\alpha(t) = \frac{k_{O_\alpha}(t) + 1}{\sum_{\beta \in O} \left(k_{O_\beta}(t) + 1\right)}, \qquad (7)$$

where $k_{U_i}(t)$ and $k_{O_\alpha}(t)$ are the degrees of user $i$ and object $\alpha$ at time
step $t$. As the degrees are all zero at the initial time, we have used $k_{U_i}(t) + 1$ and
$k_{O_\alpha}(t) + 1$ in the above equations.

To create artificial data, we set $|U| = 6000$, $|O| = 4000$ and $\eta = 2\%$ (which corresponds
to approximately $4.8 \times 10^5$ ratings). Objects' qualities and users' ratings are limited
to the range $[0, 1]$. Objects' qualities are drawn from the uniform distribution $U(0, 1)$,
users' error magnitudes are drawn from the uniform distribution $U(\delta_{\min}, \delta_{\max})$,
and individual rating errors $e_{i\alpha}$ are drawn from the normal distribution
$N(0, \delta_i)$. We choose $\delta_{\min} = 0.1$ and $\delta_{\max} = 0.5$ in the simulation.
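The data-generation procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function name `generate_ratings` and its parameters are hypothetical, the user and the object are drawn independently from the degree-plus-one weights and an already rated pair is simply redrawn, and the error magnitude is treated as the standard deviation of the normal error.

```python
import random


def generate_ratings(n_users, n_objects, sparsity, d_min=0.1, d_max=0.5, seed=0):
    """Return (ratings, quality, delta) for a synthetic rating system.

    ratings maps (user, object) -> rating truncated to [0, 1].
    """
    rng = random.Random(seed)
    quality = [rng.random() for _ in range(n_objects)]           # true qualities ~ U(0, 1)
    delta = [rng.uniform(d_min, d_max) for _ in range(n_users)]  # error magnitudes
    k_user = [0] * n_users      # user degrees
    k_obj = [0] * n_objects     # object degrees
    ratings = {}
    target = round(sparsity * n_users * n_objects)
    while len(ratings) < target:
        # Preferential attachment: weight of each node is its degree + 1.
        i = rng.choices(range(n_users), weights=[k + 1 for k in k_user])[0]
        a = rng.choices(range(n_objects), weights=[k + 1 for k in k_obj])[0]
        if (i, a) in ratings:
            continue  # assumption: redraw when the pair was already rated
        # Assumption: delta[i] is used as the standard deviation of the error.
        noisy = quality[a] + rng.gauss(0.0, delta[i])
        ratings[(i, a)] = min(1.0, max(0.0, noisy))  # truncate to the allowed range
        k_user[i] += 1
        k_obj[a] += 1
    return ratings, quality, delta
```

For example, `generate_ratings(60, 40, 0.1)` yields 240 ratings. The weight lists are rebuilt at every step, so this naive loop is only meant for small systems; at the sizes quoted in the text a cumulative-weight structure would be needed.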

To get a more accurate ranking, a good ranking algorithm should give higher reputations to the
users with lower error magnitudes. As the users' error magnitudes are continuous, we divide
the error magnitude range into bins of width 0.01 and evaluate the mean reputation of the users
whose error magnitudes fall in the same bin. Fig. 1 shows the users' mean reputation as a
function of error magnitude obtained by the correlation-based ranking algorithm. It is clear
that the higher the error magnitude of a user, the lower the reputation. The correlation
coefficient is thus a good way to quantify a user's reputation.
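The iterative procedure of the correlation model (weighted average quality, Pearson correlation, reputation update) might be sketched as follows. Several details are assumptions rather than statements from the text: negative correlations are clipped to zero so that the weights stay non-negative, the change of the quality estimates is measured as a mean squared difference, the default `threshold` is an arbitrary small value, and an object whose raters all have zero reputation falls back to the plain mean. The names `pearson` and `correlation_ranking` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if it is undefined."""
    n = len(x)
    if n < 2:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / n)
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n * sx * sy)


def correlation_ranking(ratings, n_objects, threshold=1e-6, max_iter=1000):
    """ratings maps (user, object) -> rating. Returns (quality, reputation) dicts."""
    by_user, by_obj = {}, {}
    for (i, a), r in ratings.items():
        by_user.setdefault(i, []).append((a, r))
        by_obj.setdefault(a, []).append((i, r))
    # Initial reputation proportional to the user's degree: k_i / |O|.
    rep = {i: len(votes) / n_objects for i, votes in by_user.items()}
    quality = {}
    for _ in range(max_iter):
        # Quality: reputation-weighted average of the ratings of each object.
        new_q = {}
        for a, votes in by_obj.items():
            weight = sum(rep[i] for i, _ in votes)
            if weight > 0.0:
                new_q[a] = sum(rep[i] * r for i, r in votes) / weight
            else:
                new_q[a] = sum(r for _, r in votes) / len(votes)  # assumed fallback
        # Reputation: correlation of each user's ratings with the qualities.
        for i, votes in by_user.items():
            rs = [r for _, r in votes]
            qs = [new_q[a] for a, _ in votes]
            rep[i] = max(0.0, pearson(rs, qs))  # assumption: clip negatives to 0
        # Assumed convergence measure: mean squared change of the estimates.
        change = sum((new_q[a] - quality.get(a, 0.0)) ** 2 for a in new_q) / len(new_q)
        quality = new_q
        if change < threshold:
            break
    return quality, rep
```

In a toy system where three users agree on the ordering of four objects and a fourth user rates them in reverse order, this sketch assigns the contrarian a reputation of zero and recovers the ordering of the majority.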

After the convergence of $Q_O$, we use a correlation measure called Kendall's tau [15] to judge
the ranking result of the algorithm. It is defined as

$$\tau = \frac{2}{|O|(|O|-1)} \sum_{\alpha < \beta} \mathrm{sign}\left[(Q_\alpha - Q_\beta)(Q_{O_\alpha} - Q_{O_\beta})\right], \qquad (8)$$

where $Q_\alpha$ and $Q_{O_\alpha}$ are the true and estimated qualities of object $\alpha$;
$\tau$ has the lower bound -1 (i.e. the two rankings are exactly opposite) and the upper bound
1 (i.e. the two rankings are exactly the same).

Fig. 1: The relationship between a user's reputation and his error magnitude for the
correlation-based ranking algorithm.
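As a small illustration of this measure, Kendall's tau can be computed as the average sign of the pairwise products of quality differences. The sketch below is a direct O(n^2) implementation; the function name `kendall_tau` is hypothetical, and tied pairs contribute zero, which is an assumption about how ties are treated.

```python
def kendall_tau(true_q, est_q):
    """Kendall's tau between two equal-length lists of object qualities."""
    n = len(true_q)
    total = 0
    for a in range(n):
        for b in range(a + 1, n):
            prod = (true_q[a] - true_q[b]) * (est_q[a] - est_q[b])
            total += (prod > 0) - (prod < 0)  # sign of the product; ties give 0
    return 2.0 * total / (n * (n - 1))
```

Identical orderings give 1, exactly reversed orderings give -1, and swapping one adjacent pair in a list of three objects gives 1/3.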

Besides, there is another standard measure in the information filtering literature named
AUC [16]. In most cases, the true ranking of objects is not available, and it is not possible
to evaluate the algorithm by $\tau$. Instead, we can select a group of benchmark objects by
some plausible criteria, and then use AUC to evaluate a ranking algorithm. AUC equals one when
all benchmark objects are ranked higher than the other objects, while AUC = 0.5 corresponds to
a completely randomly ranked object list. In the tests using artificial data, the 5% of all
objects with the highest quality values are selected as benchmark objects.
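One common way to compute AUC for such a benchmark set is to compare every benchmark object with every other object and count how often the benchmark object has the higher estimated quality. The sketch below follows that pairwise definition; the function name `auc` is hypothetical, and counting ties as one half is an assumption, since the text does not say how ties are handled.

```python
def auc(scores, benchmark):
    """scores maps object -> estimated quality; benchmark is a set of objects."""
    bench = [s for obj, s in scores.items() if obj in benchmark]
    rest = [s for obj, s in scores.items() if obj not in benchmark]
    wins = 0.0
    for b in bench:
        for r in rest:
            if b > r:
                wins += 1.0   # benchmark object ranked higher
            elif b == r:
                wins += 0.5   # assumption: a tie counts as half a win
    return wins / (len(bench) * len(rest))
```

With scores `{0: 0.9, 1: 0.8, 2: 0.3, 3: 0.1}`, the benchmark set `{0, 1}` gives an AUC of 1, while `{0, 3}` gives 0.5.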

Using the artificial data, we evaluate the effectiveness of the correlation-based ranking
algorithm. Table 1 shows the ranking results obtained from the artificial data, in comparison
with the straightforward mean algorithm and the IR algorithm. As we can see, in a clean rating
system without any spammers, all three algorithms are effective and do not differ much. The IR
algorithm is slightly the most effective.

Spam analysis. - In the above simulations, users are honest and give ratings with fixed error
magnitudes. In a real system, however, not all users are honest. There are many kinds of
spammers that may drastically lower the effectiveness of ranking algorithms.

In general, there are two kinds of ratings that a spam- 
mer may give: (1) Random rating: random allowable rat- 
ings on items. (2) Push rating: maximum or minimum 
allowable ratings on items. 
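The two kinds of dishonest ratings can be generated along the following lines. This is a sketch under assumptions: the function name `spam_ratings` is hypothetical, random ratings are drawn uniformly from the allowed range, and a push spammer is assumed to pick one extreme (minimum or maximum) and give it to every object it rates.

```python
import random


def spam_ratings(objects, kind, r_min=0.0, r_max=1.0, rng=random):
    """Return {object: rating} for one spammer of the given kind."""
    if kind == "random":
        # Random rating: any allowable value, unrelated to the object's quality.
        return {a: rng.uniform(r_min, r_max) for a in objects}
    if kind == "push":
        # Push rating: assumption that one extreme is used for all objects.
        extreme = rng.choice([r_min, r_max])
        return {a: extreme for a in objects}
    raise ValueError("kind must be 'random' or 'push'")
```

Passing a seeded `random.Random` instance as `rng` makes the generated spam reproducible across runs.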

A random rating spammer may be a mischievous user who just plays around with the information
and gives ratings that mean nothing. A push rating spammer always gives the maximum or minimum
allowable rating, which also means nothing. These dishonest ratings influence the accuracy of
the ranking result. A good ranking algorithm should be robust against any kind of spammer. To
evaluate the correlation-based ranking algorithm against different types of spammers, some
users are randomly selected as spammers in the artificial data. These spammers' ratings are
generated according to their spamming types. In this paper, we consider two types of spamming:
(1) spammers who always give random ratings; (2) spammers who always give push ratings. For
both types of spamming, we study how the effectiveness of the correlation-based ranking
algorithm changes as the ratio of spammers increases. For comparison, the effectiveness of the
mean and IR ranking algorithms is also studied.

Table 1: Ranking results of different algorithms for the artificial data.

Algorithm    Mean      IR        Correlation-based
AUC          0.9940    0.9965    0.9952
τ            0.9216    0.9387    0.9300

Random rating spamming. Fig. 2 shows the effectiveness of the different algorithms obtained
from the artificial data with random rating spamming. When there are no spammers, the
effectiveness of all three algorithms is almost the same. But as the ratio of spammers
increases, the correlation-based ranking algorithm becomes significantly better than the
others. When all the users are spammers, the rankings are random for all algorithms: the value
of AUC becomes 0.5 and τ becomes 0.

The correlation coefficient is a measure of the strength of the linear relationship between two
vectors. A vector of random values normally has little or no correlation with any other vector.
Thus, the reputation of random rating spammers should be very small. As shown in fig. 3(a), the
reputations of most random rating spammers are very low. Even when the ratio of spammers is
0.9, more than 70% of the spammers still have a reputation below 0.1.

For the honest users, in contrast, reputations are always high regardless of the ratio of
spammers (up to 90% of them are larger than 0.4, see fig. 3(b)). The inset in fig. 3(b) shows
the relationship between an honest user's reputation and his error magnitude. The honest user's
reputation decreases with his error magnitude. When the ratio of spammers is very large, the
decreasing curve fluctuates more, but the magnitude of the fluctuation remains very small even
when the ratio of spammers is 0.9. This shows that the reputations of honest users still
decrease with their error magnitudes in the presence of spammers.

As discussed above, the correlation-based ranking algorithm always gives lower reputations to
the random rating spammers, which decreases the influence of spammers on the ranking result.
At the same time, the reputations of honest users do not decrease significantly as the number
of spammers increases. This implies that the correlation-based ranking algorithm can nearly
remove the influence of spammers regardless of their ratio, and is highly robust against
attacks by random rating spammers.
Fig. 2: The effectiveness of different algorithms under random rating spamming, as a function
of the ratio of spammers. The result is obtained by averaging over 10 independent runs.

Fig. 3: The distributions of reputations of (a) spammers and (b) honest users for different
ratios of spammers. The inset shows the relationship between an honest user's reputation and
his error magnitude. All the spammers are random rating spammers.

Push rating spamming. The effectiveness of different algorithms with spammers who give push
ratings is shown in fig. 4. The AUC value of the correlation-based ranking algorithm is only
slightly higher than that of the other two algorithms when the ratio of spammers is high, but
the value of τ for the correlation-based ranking algorithm is significantly higher than that of
the other two algorithms.

Fig. 4: The effectiveness of different algorithms under push rating spamming, as a function of
the ratio of spammers. The result is obtained by averaging over 10 independent runs.

As push rating spammers are selected randomly, and every object has the same chance of getting
push ratings from the spammers, all object qualities calculated by the IR or mean algorithm are
higher than expected. The simulation results imply that this impact has a great influence on
the value of τ but a small influence on the AUC value. A possible reason is that the ranking
results of the IR and mean algorithms have many local fluctuations compared with the real
ranking, and these local oscillations do not influence the AUC value. As a push rating spammer
always gives push ratings, its correlation coefficient with any other vector is always 0. The
correlation-based ranking algorithm can therefore completely remove the influence of this kind
of spammer, and it has the highest robustness as indicated by either τ or AUC.
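The zero-correlation statement can be made explicit with a short worked step, assuming that a push rating spammer $i$ gives the same extreme value $c$ to every object it rates (the constant $c$ is introduced here only for illustration). Writing $\bar{r}_i$ and $\sigma_{r_i}$ for the mean and standard deviation of the spammer's ratings, and $Q_{O_\alpha}$, $\bar{Q}_{O_i}$ for the estimated qualities of the rated objects and their mean:

```latex
r_{i\alpha} = c \quad \forall\, \alpha \in O_i
\;\Longrightarrow\;
\bar{r}_i = c, \qquad
\sigma_{r_i} = 0, \qquad
\sum_{\alpha \in O_i} (r_{i\alpha} - \bar{r}_i)\,(Q_{O_\alpha} - \bar{Q}_{O_i}) = 0 .
```

The covariance with the quality vector vanishes identically, so a constant rating vector carries no linear information about object quality. The Pearson ratio itself is formally 0/0 in this case; the statement in the text corresponds to assigning such a user a correlation, and hence a weight, of zero.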

From the results discussed above we can conclude that, although the IR algorithm is the most
effective for a clean system without spammers, the correlation-based algorithm is clearly
better able to resist spammer attacks.

Real data experiment. - After the analyses with artificial data, some real systems are studied
in this section. We use two distinct real data sets containing movie ratings: Netflix and
Movielens. Movielens is provided by the GroupLens project at the University of Minnesota
(www.grouplens.org). We use their 1 million ratings data set, given on the integer rating scale
from 1 to 5. Each user in the Movielens data set has at least 20 ratings. Netflix is a huge
data set released by the DVD rental company Netflix for its Netflix Prize
(www.netflixprize.com). We extracted a smaller data set by choosing 4968 users who have rated
at least 20 movies (just like Movielens) and took all movies they had rated. The Netflix
ratings are also given on the integer rating scale from 1 to 5. The characteristics of these
data sets are summarized in table 2.

Table 2: Properties of the applied data sets. |U| is the number of users, |O| is the number of
objects, k_U is the mean degree of users, k_O is the mean degree of objects, and sparsity is
the sparsity of the data set.

Data set     |U|     |O|      k_U    k_O    Sparsity
Movielens    6040    3883     166    270    0.0426
Netflix      4968    16331    242    74     0.0148

As already explained above, one needs an independently selected set of so-called benchmark
objects to test a ranking algorithm on real data. In our tests, we use movies nominated for
the best picture category at the annual Academy Awards, popularly known as the Oscars (as a
source of information we used www.filmsite.org), as benchmark objects. There are 203 benchmark
movies in the Movielens data set and 299 in the Netflix data set.

The AUC values of the different algorithms on real data are shown in table 3. For the Movielens
data set, the IR algorithm has the best effectiveness, while for the Netflix data set, the
correlation-based algorithm has the best performance. The AUC values for Movielens are clearly
higher than those for Netflix with all three algorithms (ranging from 0.8723 to 0.8763 for
Movielens and from 0.7609 to 0.7742 for Netflix), which may suggest that the Netflix data set
includes more spammers than the Movielens data set. Based on the results for artificial data,
this suggests that the correlation-based ranking algorithm obtains a better result than the IR
algorithm for Netflix because it is more robust against spammer attacks.

Conclusion and discussion. - It is a big challenge to get the right ranking of objects in
rating systems where users vote on objects with rating scores, especially when spammers are
present. In a user reputation system, how to decide a user's trust value is a crucial question.
As correlation is a good way to describe the similarity between two vectors, we choose the
correlation coefficient to represent a user's reputation and use an iterative method to obtain
the result step by step. On the artificially generated data, the presented correlation-based
ranking algorithm resists spammer attacks effectively. In tests with real data, the present
algorithm is more effective than the IR algorithm for Netflix, but less effective for
Movielens. That may suggest that the Netflix data set includes more spammers than Movielens,
and that the present algorithm is more robust to spammers' attacks than the other two
algorithms.

Table 3: AUC values of different algorithms for the real data sets.

Algorithm    Mean      IR        Correlation-based
Movielens    0.8730    0.8763    0.8723
Netflix      0.7609    0.7650    0.7742

A good ranking algorithm should be both robust and accurate. The correlation-based algorithm
presented in this paper tackles the problem of robustness more effectively than the others. As
for accuracy, there is still much room for improvement. On the other hand, how to judge the
ranking result is also a problem. Some movies that have not received any award are nevertheless
widely loved, so using only movies nominated for a famous award as the benchmark is not
entirely reasonable. The effectiveness of a ranking algorithm is easy to evaluate with
artificial data, since the true qualities are known. If real data can be completely replaced by
artificial data, it will be easier to evaluate a given ranking algorithm. Our future work will
focus on how to build more reasonable models to generate artificial data and on improving the
accuracy of the ranking algorithm.